OCPBUGS-88757: Align HCCO CatalogSource registryPoll interval with OCP defaults#8745
OCPBUGS-88757: Align HCCO CatalogSource registryPoll interval with OCP defaults#8745shijadha wants to merge 1 commit into
Conversation
Update HCCO-managed CatalogSource registryPoll interval from 10m to 240m to match standard OCP 4.18+ defaults set by operator-marketplace. This change: - Reduces control plane resource consumption from catalog polling - Decreases etcd churn and storage growth - Improves scalability for management clusters hosting many HCPs - Aligns HyperShift behavior with upstream OCP (changed in OCPBUGS-69441) Trade-off: Catalog update discovery latency increases from 10 minutes to 4 hours. Users can manually trigger refresh by deleting the CatalogSource pod if needed. Signed-off-by: Shital Jadhav <[email protected]> Commit-Message-Assisted-by: Claude (via Claude Code)
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@shijadha: This pull request references Jira Issue OCPBUGS-88757, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
📝 WalkthroughWalkthroughThe OLM catalog source registry polling interval in Suggested Reviewers
🚥 Pre-merge checks | ✅ 11✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: shijadha The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @shijadha. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/resources/olm/catalogs.go`:
- Around line 44-45: Add a unit test to verify the 240-minute polling interval
configuration change for the CatalogSource. Create a test function that
instantiates a CatalogSource, calls the function that modifies it (likely
ReconcileRedHatOperatorsCatalogSource based on the context), and then asserts
that both the RawInterval field equals "240m" and the Interval.Duration field
equals 240*time.Minute as shown in the provided example test structure. This
test should be placed in the appropriate test file for the catalogs.go module.
- Around line 44-45: The polling interval changes for the four default catalog
sources (certified, community, marketplace, red-hat-operators) in the
catalogs.go file lack accompanying unit tests. Add unit tests that verify the
RawInterval and Interval fields are correctly configured for each of the four
default catalog sources with the 240-minute polling interval. The tests should
validate that the interval values are properly set and match across both the
string representation and the metav1.Duration object for each catalog source.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 7d0e7fc3-d98e-44a5-ab91-4db377db2305
📒 Files selected for processing (1)
control-plane-operator/hostedclusterconfigoperator/controllers/resources/olm/catalogs.go
| RawInterval: "240m", | ||
| Interval: &metav1.Duration{Duration: 240 * time.Minute}, |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add unit tests for the interval change.
The coding guidelines require unit tests for code changes in control-plane-operator/**/*.go. While this is a configuration change, it significantly alters the polling behavior (from 10 minutes to 240 minutes). Consider adding a unit test that verifies both RawInterval and Interval fields are set to the expected 240-minute values in the reconciled CatalogSource.
🧪 Example test structure
func TestReconcileCatalogSourceRegistryPollInterval(t *testing.T) {
cs := &operatorsv1alpha1.CatalogSource{}
params := &OperatorLifecycleManagerParams{
RedHatOperatorsImage: "test-image",
OLMCatalogPlacement: hyperv1.GuestOLMCatalogPlacement,
}
ReconcileRedHatOperatorsCatalogSource(cs, params)
require.NotNil(t, cs.Spec.UpdateStrategy)
require.NotNil(t, cs.Spec.UpdateStrategy.RegistryPoll)
assert.Equal(t, "240m", cs.Spec.UpdateStrategy.RegistryPoll.RawInterval)
assert.Equal(t, 240*time.Minute, cs.Spec.UpdateStrategy.RegistryPoll.Interval.Duration)
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/resources/olm/catalogs.go`
around lines 44 - 45, Add a unit test to verify the 240-minute polling interval
configuration change for the CatalogSource. Create a test function that
instantiates a CatalogSource, calls the function that modifies it (likely
ReconcileRedHatOperatorsCatalogSource based on the context), and then asserts
that both the RawInterval field equals "240m" and the Interval.Duration field
equals 240*time.Minute as shown in the provided example test structure. This
test should be placed in the appropriate test file for the catalogs.go module.
Source: Coding guidelines
🧩 Analysis chain
🌐 Web query:
operator-framework api RegistryPoll RawInterval field format requirements version 0.37.0
💡 Result:
In the Operator Lifecycle Manager (OLM) API, the registry poll interval for a CatalogSource is configured via the interval field within the registryPoll updateStrategy [1][2]. This field expects a duration string format [3], such as "45m" or "10m" [1][2]. Regarding the RawInterval field: 1. Technical Role: The RawInterval field was introduced as an internal mechanism to handle the parsing of this interval [4][5]. It serves as a raw string representation of the duration before it is unmarshalled into the structured interval used by the OLM controller [4]. 2. Usage: This field is generally not intended for manual configuration by end-users in their YAML manifests [1][2]. Users should continue to use the interval field, which accepts standard duration strings [1][2]. 3. Version 0.37.0 context: The version 0.37.0 release of the operator-framework/api module refers to updates within the OLM API and associated ecosystem [6][7]. While this version includes updates for new fields (such as a new release field for CSVs) [6][7], the fundamental format requirements for catalog polling intervals remain based on standard duration parsing (e.g., Go's time.ParseDuration). In summary, users should specify the polling frequency using the interval field in their CatalogSource YAML using standard duration notation (e.g., 15m, 1h) [1][2]. The RawInterval field is an internal implementation detail and is not required or recommended for end-user manifests [4][5].
Citations:
- 1: https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/catalog-polling.md
- 2: https://olm.operatorframework.io/docs/tasks/make-catalog-available-on-cluster/
- 3: https://docs.okd.io/4.6/rest_api/operatorhub_apis/catalogsource-operators-coreos-com-v1alpha1.html
- 4: openshift/operator-framework-olm@0d75704
- 5: openshift/operator-framework-olm@5f79495
- 6: 🌱 Bump github.com/operator-framework/api from 0.36.0 to 0.37.0 operator-framework/operator-controller#2414
- 7: 🌱 Bump github.com/operator-framework/api from 0.36.0 to 0.37.0 operator-framework/operator-lifecycle-manager#3723
🏁 Script executed:
#!/bin/bash
# Description: Search for RegistryPoll usage patterns and validation in the codebase
# Look for existing RegistryPoll configurations to confirm string format patterns
rg -n -C2 'RawInterval.*".*m"' --type=go
# Look for any validation or parsing of RawInterval
rg -n -C3 'RawInterval' --type=go -g '!vendor/' | head -50Repository: openshift/hypershift
Length of output: 1592
Add unit tests for the catalog source polling interval changes.
The RawInterval string format "240m" is valid for Go duration parsing and is correct. However, this change modifies the polling behavior across all four default catalog sources (certified, community, marketplace, red-hat-operators) without accompanying unit tests. Control-plane-operator changes require unit tests per coding guidelines.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/resources/olm/catalogs.go`
around lines 44 - 45, The polling interval changes for the four default catalog
sources (certified, community, marketplace, red-hat-operators) in the
catalogs.go file lack accompanying unit tests. Add unit tests that verify the
RawInterval and Interval fields are correctly configured for each of the four
default catalog sources with the 240-minute polling interval. The tests should
validate that the interval values are properly set and match across both the
string representation and the metav1.Duration object for each catalog source.
Summary
Updates the HCCO-managed CatalogSource
registryPoll.intervalfrom10mto240mto align with standard OCP 4.18+ defaults established by operator-marketplace.Context
In standard OCP clusters (4.18+), the operator-marketplace component sets the default
registryPollinterval to 240 minutes (4 hours) to address performance issues including unbounded etcd growth and high I/O on control plane nodes (see operator-marketplace PR #695 and OCPBUGS-69441).In HyperShift/HCP hosted clusters, the Hosted Cluster Config Operator (HCCO) manages CatalogSources independently with its own reconciliation logic in
catalogs.go, which still hardcodes the interval to 10 minutes. This creates inconsistency between standard OCP and HCP behavior.Changes
RawIntervalfrom"10m"to"240m"incatalogs.goIntervalduration from10 * time.Minuteto240 * time.MinuteBenefits
Trade-offs
spec.imagefieldRelated Issues
Testing
Summary by CodeRabbit